Clustering is a difficult topic to address because, unlike regression, classification, and other prediction problems, there isn't an obvious goal. Typically it's an exploratory technique.
In this lab, we'll be clustering pixels in images. People have different goals when they do this.
Pixel clustering is illustrative of several important phenomena.
Much more so than prediction [regression or classification] methods, clustering is sensitive to the features you use and the way that they're scaled. That's because when we fit predictive models, the parameters we estimate adapt to the scale of the features relevant to the prediction task. In clustering, there's no clear task to adapt to. We get what we put in.
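A small sketch of this sensitivity, using k-means on made-up two-dimensional data (the data, scale factor, and cluster count here are invented for illustration): shrinking the feature that actually separates the groups makes the clustering latch onto noise instead.

```python
import numpy as n
from sklearn.cluster import KMeans

rng = n.random.default_rng(0)
# Two groups separated only along the first feature.
x = n.vstack([rng.normal([0, 0], 1, (50, 2)),
              rng.normal([6, 0], 1, (50, 2))])

# Clustering on the raw features recovers the two groups.
labels_raw = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(x)

# Shrink the separating feature: clustering now splits along the noise axis.
x_scaled = x * n.array([0.01, 1.0])
labels_scaled = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(x_scaled)
```

With Gaussian mixtures and full covariances the story is subtler, since the fitted covariances can partly adapt to a rescaling; k-means just makes the effect easy to see.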
We'll use finite mixtures of Gaussians to cluster.
To explore the clustering in 3-dimensional [RGB] color space, we'll be using interactive 3D plots generated by the plotly package. If you're not able to install the package, the distributed HTML version, which I've generated simply by saving this notebook, includes saved versions of these plots which are still interactive.
from scipy import misc  # misc.imread was removed in SciPy 1.2; on newer SciPy, use imageio.imread instead
import numpy as n
import sklearn.mixture
import matplotlib.pyplot as plt
from plotly import offline
import plotly.graph_objs as go
%matplotlib inline
plt.rcParams['figure.figsize'] = [20, 10]
offline.init_notebook_mode()
In the plots generated by the following cell, we cluster pixels by color as 3-dimensional RGB vectors. We use Gaussian mixtures, allowing each cluster to have its own $3 \times 3$ covariance matrix, and cluster at multiple scales: 2, 4, 10, and 30 clusters. In the left column, we see the original image. In the center, the quantized version, where each pixel in a cluster is assigned one color, the cluster center. On the right, we randomly re-assign colors to the clusters to get a sense of what position we'd be in if we ignored similarity between clusters.
def as_pixels(im):
    # Flatten an (H, W, C) image into an (H*W, C) matrix of per-pixel feature vectors.
    return im.reshape((im.shape[0]*im.shape[1], im.shape[2]))

def as_image_like(labels, im):
    # Reshape per-pixel values back into the spatial layout of im.
    return labels.reshape((im.shape[0], im.shape[1]) + labels.shape[1:])

def quantize(im, mixture, permute_colors=False):
    # Replace each pixel with its cluster's mean color (the first 3 feature dimensions).
    labels = mixture.predict(as_pixels(im))
    means = mixture.means_[:, :3].astype('uint8')
    if permute_colors:
        means = means[n.random.permutation(mixture.n_components), :]
    return as_image_like(means[labels, :], im)
n.random.seed(0)
im = misc.imread('bird.jpg')
for k in [2, 4, 10, 30]:
    mixture = sklearn.mixture.GaussianMixture(k).fit(as_pixels(im))
    plt.figure()
    plt.suptitle(f'{mixture.n_components} clusters. image (left), quantized colors (center), random colors (right)')
    plt.subplot(1,3,1); plt.imshow(im)
    plt.subplot(1,3,2); plt.imshow(quantize(im, mixture))
    plt.subplot(1,3,3); plt.imshow(quantize(im, mixture, permute_colors=True))
To explore the clustering in more detail, we will plot the pixels as points in 3-dimensional RGB space. We'll use 4 clusters. There are six plots: (1) all pixels, unwhitened, each point colored by its cluster's mean color; (2) all pixels, whitened by the pooled covariance of all pixels; (3) the pixels of cluster 1 alone, shown in their true colors and whitened by that cluster's fitted covariance.
Plots 4-6 are like (3) but for clusters 2-4.
You can rotate by clicking and dragging, and zoom by scrolling while hovering over a plot. To scroll past the plots, move your pointer to the right of them and scroll.
def scatter(pixels, labels, mixture, whiten_cluster=None):
    # If all points belong to one cluster, show their true pixel colors;
    # otherwise color each point by its cluster's mean color.
    if n.all(labels == labels[0]):
        colors = n.array(["rgb(%d,%d,%d)" % tuple(p) for p in pixels.astype('uint8')])
    else:
        cluster_colors = n.array(["rgb(%d,%d,%d)" % tuple(m) for m in mixture.means_.astype('uint8')])
        colors = cluster_colors[labels]
    # Pick the covariance to whiten by; sklearn stores covariances_ in a
    # different shape for each covariance_type.
    if whiten_cluster is None:
        cov = n.eye(3)
    elif whiten_cluster == 'pooled':
        cov = n.cov(pixels.T)
    elif mixture.covariance_type == 'full':
        cov = mixture.covariances_[whiten_cluster]
    elif mixture.covariance_type == 'tied':
        cov = mixture.covariances_
    elif mixture.covariance_type == 'diag':
        cov = n.diag(mixture.covariances_[whiten_cluster])
    elif mixture.covariance_type == 'spherical':
        cov = n.eye(3)*mixture.covariances_[whiten_cluster]
    # chol(cov^{-1}) is a whitening transformation: after applying it,
    # the chosen covariance becomes the identity.
    whitening_transformation = n.linalg.cholesky(n.linalg.inv(cov))
    white_px = n.dot(pixels, whitening_transformation)
    return go.Scatter3d(x=white_px[:,0], y=white_px[:,1], z=white_px[:,2],
                        mode='markers', marker=dict(size=4, color=colors))
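The whitening step in `scatter` works because if $L L^T = \Sigma^{-1}$ (the Cholesky factorization), then $L^T \Sigma L = I$. A quick check on made-up correlated data (the matrix `a` and sample size are arbitrary):

```python
import numpy as n

rng = n.random.default_rng(0)
# Correlated 3-D points, standing in for the pixels of one cluster.
a = rng.normal(size=(3, 3))
pts = rng.normal(size=(5000, 3)) @ a.T

cov = n.cov(pts.T)
# Same construction as in scatter().
w = n.linalg.cholesky(n.linalg.inv(cov))
white = pts @ w

print(n.round(n.cov(white.T), 2))  # the identity matrix
```

Because we whiten with the sample covariance itself, the whitened sample covariance is exactly the identity up to floating-point error.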
im = misc.imread('bird.jpg')
mixture = sklearn.mixture.GaussianMixture(4).fit(as_pixels(im))
def plot_components(mixture, im, thin_to=5000):
    pixels = as_pixels(im)
    labels = mixture.predict(pixels)
    n.random.seed(0)
    # Thin to at most thin_to points so the 3D plots stay responsive.
    thin = n.random.permutation(pixels.shape[0]) < thin_to
    fig = go.Figure(data=[scatter(pixels[thin,:], labels[thin], mixture, whiten_cluster=None)]); offline.iplot(fig)
    fig = go.Figure(data=[scatter(pixels[thin,:], labels[thin], mixture, whiten_cluster='pooled')]); offline.iplot(fig)
    # One plot per cluster, whitened by that cluster's fitted covariance.
    for cc in range(mixture.n_components):
        thin = n.nonzero(labels == cc)[0]
        n.random.shuffle(thin)
        thin = thin[:thin_to]
        fig = go.Figure(data=[scatter(pixels[thin,:],
                                      labels[thin], mixture, whiten_cluster=cc)])
        offline.iplot(fig)
mixture = sklearn.mixture.GaussianMixture(4, covariance_type='full').fit(as_pixels(im))
plot_components(mixture, im)
In the next two cells, we repeat the example above, but restricting the mixture of Gaussians we fit to have spherical [scaled-identity] covariance matrices. As you can see in the unwhitened plot (1) above, the actual covariances are far from spherical. Because the model is a poor fit, the clusters are more heterogeneous than those above.
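The branching over `covariance_type` in `scatter` above exists because sklearn stores `covariances_` in a different shape for each type. A quick way to see this, on throwaway synthetic data (the sizes are arbitrary):

```python
import numpy as n
import sklearn.mixture

rng = n.random.default_rng(0)
x = rng.normal(size=(200, 3))

for ct in ['full', 'tied', 'diag', 'spherical']:
    m = sklearn.mixture.GaussianMixture(4, covariance_type=ct,
                                        random_state=0).fit(x)
    print(ct, m.covariances_.shape)
# full      -> (4, 3, 3)  one matrix per component
# tied      -> (3, 3)     one matrix shared by all components
# diag      -> (4, 3)     per-component, per-feature variances
# spherical -> (4,)       one scalar variance per component
```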
im = misc.imread('bird.jpg')
for k in [2, 4, 10, 30]:
    mixture = sklearn.mixture.GaussianMixture(k, covariance_type='spherical').fit(as_pixels(im))
    plt.figure()
    plt.suptitle(f'{mixture.n_components} clusters. image (left), quantized colors (center), random colors (right)')
    plt.subplot(1,3,1); plt.imshow(im)
    plt.subplot(1,3,2); plt.imshow(quantize(im, mixture))
    plt.subplot(1,3,3); plt.imshow(quantize(im, mixture, permute_colors=True))
mixture = sklearn.mixture.GaussianMixture(4, covariance_type='spherical').fit(as_pixels(im))
plot_components(mixture, im)
The bird image above was fairly simple. It had sky, a tree, a bird, and some shadow around the edges. Here we're going to repeat our analysis [using full covariance structure] on a new image.
Here, lighting tends to give different areas of an object different colors in the image, even when the material is the same. Clustering on color alone doesn't seem to identify objects in the image. It works best in regions where lighting-induced color changes are gradual, like the sky and grass.
im = misc.imread('bridge.jpg')
for k in [2, 4, 10, 30]:
    mixture = sklearn.mixture.GaussianMixture(k).fit(as_pixels(im))
    plt.figure()
    plt.suptitle(f'{mixture.n_components} clusters. image (left), quantized colors (center), random colors (right)')
    plt.subplot(1,3,1); plt.imshow(im)
    plt.subplot(1,3,2); plt.imshow(quantize(im, mixture))
    plt.subplot(1,3,3); plt.imshow(quantize(im, mixture, permute_colors=True))
im = misc.imread('bridge.jpg')
mixture = sklearn.mixture.GaussianMixture(4).fit(as_pixels(im))
plot_components(mixture, im)
To improve the clustering, we will include pixel location in the set of features we use to cluster. Each pixel is represented as a 5-dimensional vector [R,G,B,X,Y] (the RGB channels first, matching the order in which the code concatenates them).
As you might expect, we get more spatial continuity in our clusters. However, this introduces artifacts as well. In some cases, objects of similar colors are broken up simply because they're big, so their XY coordinates span a wide range.
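The feature construction in the next cell leans on `n.mgrid`, which can be opaque. On a toy 2×3 "image" (sizes and values chosen only for illustration) it works like this:

```python
import numpy as n

# A 2x3 image with 3 color channels, values arbitrary.
im = n.arange(18).reshape(2, 3, 3)

# n.mgrid[:2, :3] has shape (2, 2, 3): a grid of row indices, then one of
# column indices. Transposing to (2, 3, 2) puts the (row, col) pair last,
# so it can be concatenated onto the image like two extra channels.
grid = n.mgrid[:im.shape[0], :im.shape[1]].transpose([1, 2, 0])

feature_im = n.concatenate((im, grid), axis=2)
print(feature_im.shape)  # (2, 3, 5): R, G, B, row, col per pixel
print(feature_im[1, 2])  # bottom-right pixel: its 3 channels, then [1, 2]
```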
bridge = misc.imread('bridge.jpg')
im = bridge
grid = n.mgrid[:im.shape[0], :im.shape[1]].transpose([1,2,0])
feature_im = n.concatenate((im, grid), axis=2)
for k in [2, 4, 10, 30]:
    mixture = sklearn.mixture.GaussianMixture(k, covariance_type='full').fit(as_pixels(feature_im))
    plt.figure()
    plt.suptitle(f'{mixture.n_components} clusters. image (left), quantized colors (center), random colors (right)')
    plt.subplot(1,3,1); plt.imshow(im)
    plt.subplot(1,3,2); plt.imshow(quantize(feature_im, mixture))
    plt.subplot(1,3,3); plt.imshow(quantize(feature_im, mixture, permute_colors=True))
We repeat this using Gaussian mixtures with spherical covariance. This doesn't work as well. Large and consistently colored objects like the sky are broken up. When we used mixtures with full covariance previously, the Gaussians we fit were elongated in the directions in which the X,Y coordinates of large objects like the sky varied. If we assume spherical covariance, this isn't allowed to happen.
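The effect can be reproduced on synthetic data: fit spherical and full-covariance mixtures to strongly elongated clusters and compare the average per-point log-likelihood via `score`. The data, stretch factors, and sizes here are invented for illustration.

```python
import numpy as n
import sklearn.mixture

rng = n.random.default_rng(0)
# Two clusters, each stretched far more in x than in y,
# separated along y (loosely analogous to the sky's pixels).
stretch = n.array([10.0, 0.5])
x = n.vstack([rng.normal(size=(500, 2)) * stretch,
              rng.normal(size=(500, 2)) * stretch + [0.0, 20.0]])

for ct in ['full', 'spherical']:
    m = sklearn.mixture.GaussianMixture(2, covariance_type=ct,
                                        random_state=0).fit(x)
    print(ct, m.score(x))  # mean log-likelihood; higher is better
```

The full-covariance fit can elongate each component along x, while the spherical fit must pay for that spread in every direction, so its likelihood is noticeably lower.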
bridge = misc.imread('bridge.jpg')
im = bridge
grid = n.mgrid[:im.shape[0], :im.shape[1]].transpose([1,2,0])
feature_im = n.concatenate((im, grid), axis=2)
for k in [2, 4, 10, 30]:
    mixture = sklearn.mixture.GaussianMixture(k, covariance_type="spherical").fit(as_pixels(feature_im))
    plt.figure()
    plt.suptitle(f'{mixture.n_components} clusters. image (left), quantized colors (center), random colors (right)')
    plt.subplot(1,3,1); plt.imshow(im)
    plt.subplot(1,3,2); plt.imshow(quantize(feature_im, mixture))
    plt.subplot(1,3,3); plt.imshow(quantize(feature_im, mixture, permute_colors=True))
We can encode pixel location in more than one way. Next, instead of augmenting the RGB coordinates with XY coordinates, we augment them with a 6-dimensional vector that is informative about location: the distances to 6 random locations in the image. As currently tuned, this doesn't seem to work terribly well.
bridge = misc.imread('bridge.jpg')
im = bridge
grid = n.mgrid[:im.shape[0], :im.shape[1]].transpose([1,2,0])
n.random.seed(0)
# 6 random anchor rows and columns ([:6] takes 6 values; [1:6] would take only 5).
random_i = n.random.permutation(im.shape[0])[:6]
random_j = n.random.permutation(im.shape[1])[:6]
distances = [n.atleast_3d(n.sqrt((grid[:,:,0]-ii)**2 + (grid[:,:,1]-jj)**2))/5000 for (ii, jj) in zip(random_i, random_j)]
feature_im = n.concatenate([im] + distances, axis=2)
for k in [2, 4, 10, 30]:
    mixture = sklearn.mixture.GaussianMixture(k).fit(as_pixels(feature_im))
    plt.figure()
    plt.suptitle(f'{mixture.n_components} clusters. image (left), quantized colors (center), random colors (right)')
    plt.subplot(1,3,1); plt.imshow(im)
    plt.subplot(1,3,2); plt.imshow(quantize(feature_im, mixture))
    plt.subplot(1,3,3); plt.imshow(quantize(feature_im, mixture, permute_colors=True))
Exercise 1. Can you find and tune a set of features resulting in a good clustering of pixels in this image? Consider
Exercise 2. See what happens when you use another image. You should be able to substitute in the code the filename of an image in any standard format.